Day26: Hands-on: From Depth Maps to 3D Point Clouds

2024 iThome Ironman Contest

DAY 27

AI/ ML & Data

3D Reconstruction in Practice: Camera Pose Estimation and 3D Reconstruction from 2D Images, part 27 of the series

16th Ironman Contest

幕村琉德滑

Team: 天堂製造

2024-10-11 23:08:27


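Once we have a depth value for every pixel, we can convert those depth values into points in 3D space. This amounts to back-projecting each pixel into 3D space, which gives us a 3D point cloud.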
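Written out, the back-projection is just the camera projection solved for the 3D coordinates. The block below is a sketch that assumes an ideal pinhole camera with no lens distortion. It uses the same symbols as the rest of this post: (x, y) is the pixel coordinate, (X, Y, Z) is the 3D point in the camera frame with Z equal to the depth value, and f_x, f_y, c_x, c_y are the entries of the intrinsic matrix K.

```latex
\begin{aligned}
x &= f_x \frac{X}{Z} + c_x, & y &= f_y \frac{Y}{Z} + c_y \\[4pt]
X &= \frac{(x - c_x)\, Z}{f_x}, & Y &= \frac{(y - c_y)\, Z}{f_y}
\end{aligned}
```

The first row is the forward projection and the second row is its inverse. The inverse is only defined for pixels where Z is known and greater than zero.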

The following shows how to do this. First, read in the RGB image and the depth map:

import sys
import numpy as np
import cv2
from vispy import app, scene
from dataset import TUMRGBD     # Our dataset class


# Create canvas
canvas = scene.SceneCanvas(title="Back projection", keys="interactive", show=True)
# Make color white
canvas.bgcolor = "white"

# Create view and set the viewing camera
view = canvas.central_widget.add_view()
view.camera = "turntable"
view.camera.fov = 50
view.camera.distance = 10

def main():
    dataset = TUMRGBD("data/rgbd_dataset_freiburg2_desk")

    frames = []    
    # Get the first two valid frames
    for i in range(0, len(dataset), 100):
        x = dataset[i]
        if x is None:
            continue
        frames.append(x)
        if len(frames) == 2:
            break
        
    rgb = cv2.imread(frames[0]["rgb_path"])
    rgb = cv2.cvtColor(rgb, cv2.COLOR_BGR2RGB)
    depth = cv2.imread(frames[0]["depth_path"], cv2.IMREAD_UNCHANGED)    
    depth = depth.astype(np.float32) / 5000.0
    K = dataset.intrinsic_matrix()
    
    points, colors = convert_point_cloud(rgb, depth, K)
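The division by 5000.0 above is worth isolating. The sketch below assumes the TUM RGB-D convention, in which depth PNGs are 16-bit images with 5000 units per metre and 0 marking a pixel without a measurement. The name `depth_to_meters` and the tiny 2x2 array are hypothetical and only stand in for the image returned by cv2.imread.

```python
import numpy as np

# Assumption: TUM RGB-D depth PNGs are 16-bit, with 5000 units per metre
# and 0 marking pixels that have no measurement.
DEPTH_SCALE = 5000.0


def depth_to_meters(raw_depth):
    """Convert a raw uint16 depth image to float32 metres."""
    return raw_depth.astype(np.float32) / DEPTH_SCALE


# Hypothetical 2x2 raw depth image standing in for
# cv2.imread(path, cv2.IMREAD_UNCHANGED).
raw = np.array([[5000, 0], [12500, 2500]], dtype=np.uint16)
depth_m = depth_to_meters(raw)
valid = depth_m > 0  # pixels that carry a real measurement

print(depth_m)
print(valid.sum(), "valid pixels")
```

The cv2.IMREAD_UNCHANGED flag matters here: with the default flag, OpenCV converts the image to 8-bit, and the 16-bit depth values would be lost.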

Next, implement the convert_point_cloud function, which converts the RGB image and the depth map into a 3D point cloud. It uses the camera intrinsic matrix K:

def convert_point_cloud(rgb, depth, K):
    H, W = depth.shape
    fx = K[0, 0]
    fy = K[1, 1]
    cx = K[0, 2]
    cy = K[1, 2]
    
    x, y = np.meshgrid(np.arange(W), np.arange(H), indexing="xy")
    x = x.flatten()
    y = y.flatten()
    
    # Get the values
    rgb = rgb[y, x, :]
    depth = depth[y, x]

    # Remove invalid depth values (depth == 0)
    mask = depth > 0
    x = x[mask]
    y = y[mask]
    depth = depth[mask]
    rgb = rgb[mask, :]
    
    # Forward project: x = (X * fx / Z) + cx
    # Inverse project: X = (x - cx) * depth / fx
    X = (x - cx) * depth / fx
    Y = (y - cy) * depth / fy
    Z = depth
    points = np.stack([X, Y, Z], axis=-1)
    return points, rgb

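Here is an explanation of this function:

np.meshgrid in NumPy gives us a grid, which is simply the x and y coordinates of every pixel in the image.
depth[y, x] gets the depth value of each pixel. Note that y comes before x here, because the first array axis (y) corresponds to the image height and the second (x) to the image width.
mask = depth > 0 filters out invalid depth values. As mentioned in the previous post, some pixels may have no depth value.
The original projection formula is x = (X * fx / Z) + cx. Inverting it gives X = (x - cx) * depth / fx, where depth is Z. Y follows in the same way from y, fy and cy, so we obtain the 3D coordinates of every pixel.
Finally, X, Y and Z are stacked into points, which is the 3D point cloud, and rgb holds the color of each point.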
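One way to check the steps above is a round trip: back-project a few pixels, project the resulting points again, and confirm that the original pixel coordinates come back. The sketch below does this on a tiny synthetic depth map. The helper names `back_project` and `project`, the intrinsic values and the depth values are all made up for illustration and are not taken from the dataset.

```python
import numpy as np


def back_project(x, y, depth, K):
    """Lift pixel coordinates (x, y) with depth Z into camera-frame 3D points."""
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    X = (x - cx) * depth / fx
    Y = (y - cy) * depth / fy
    return np.stack([X, Y, depth], axis=-1)


def project(points, K):
    """Project camera-frame 3D points back to pixel coordinates."""
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    x = points[:, 0] * fx / points[:, 2] + cx
    y = points[:, 1] * fy / points[:, 2] + cy
    return x, y


# Hypothetical intrinsics and a tiny 2x3 depth map (metres), for illustration only.
K = np.array([[500.0, 0.0, 1.0],
              [0.0, 500.0, 0.5],
              [0.0, 0.0, 1.0]])
depth = np.array([[1.0, 2.0, 0.0],
                  [1.5, 0.0, 3.0]], dtype=np.float32)

H, W = depth.shape
x, y = np.meshgrid(np.arange(W), np.arange(H), indexing="xy")
x, y, d = x.flatten(), y.flatten(), depth.flatten()
mask = d > 0  # drop pixels without a depth reading
x, y, d = x[mask], y[mask], d[mask]

points = back_project(x, y, d, K)
x_back, y_back = project(points, K)
print(points.shape)  # (4, 3): two of the six pixels had no depth
print(np.allclose(x_back, x), np.allclose(y_back, y))
```

If the round trip fails on real data, the usual suspects are swapped x and y indices or a mismatch between the intrinsics and the image resolution.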

Next, we can display this 3D point cloud with vispy. We keep only every 10th point so that there are not too many points to draw:

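def create_point_cloud(points, colors, radius=2.0, parent=None):
    # Convert uint8 colors (0-255) to floats in [0, 1]
    colors = colors.astype(np.float32) / 255.0
    point_cloud = scene.visuals.Markers()
    point_cloud.set_data(points, face_color=colors, edge_width=0.0, size=radius)
    point_cloud.parent = parent
    # Return the visual so the caller can keep a reference to it
    return point_cloud

# Continuing after convert_point_cloud(): keep every 10th point and its color
points = points[::10, :]
colors = colors[::10]
create_point_cloud(points, colors, parent=view.scene)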
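The important detail when thinning the cloud is that points and colors must be indexed with the same indices, otherwise the colors end up on the wrong points. The sketch below wraps this in a hypothetical helper called `subsample`. With no seed it keeps every step-th point, as in the code above. With a seed it draws a random subset of the same size instead, which is an alternative added here for comparison and is not something this post uses.

```python
import numpy as np


def subsample(points, colors, step=10, seed=None):
    """Keep roughly 1/step of the points, applying the same indices to colors.

    seed=None keeps every step-th point; an integer seed draws a random
    subset of the same size instead.
    """
    n = len(points)
    if seed is None:
        idx = np.arange(0, n, step)
    else:
        rng = np.random.default_rng(seed)
        size = len(range(0, n, step))
        idx = np.sort(rng.choice(n, size=size, replace=False))
    return points[idx], colors[idx]


# Synthetic stand-ins for the (N, 3) points and (N, 3) uint8 colors.
points = np.arange(300, dtype=np.float32).reshape(100, 3)
colors = np.tile(np.arange(100, dtype=np.uint8)[:, None], (1, 3))

p_stride, c_stride = subsample(points, colors)
p_rand, c_rand = subsample(points, colors, seed=0)
print(p_stride.shape, p_rand.shape)  # (10, 3) (10, 3)
```

Because the points come from a flattened image, a fixed stride follows the image scan order, so a random subset may look more even on screen. Either way the result can be passed to create_point_cloud unchanged.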

This produces the following result:
[Figure: pcd]

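We can also combine it with the camera pose visualization from earlier posts. Seeing how the camera and the point cloud line up lets us verify that the result is correct: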

rgb = cv2.cvtColor(rgb, cv2.COLOR_RGB2RGBA)
rgb[:, :, 3] = 128
create_frustum_with_image(
    rgb,
    np.eye(4),
    color="blue",
    axis=True,
)

[Figure: pcd]

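The coordinate transformation is omitted here. Readers can also apply the camera pose to the frustum, in which case the same pose must be applied to the point cloud as well.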
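Applying a pose to the point cloud could look like the sketch below. It assumes the pose is a 4x4 camera-to-world matrix, the same kind of matrix that takes the place of np.eye(4) in the frustum call above. The helper `transform_points` and the example pose `T_wc` are hypothetical and not part of the original code.

```python
import numpy as np


def transform_points(points, T):
    """Apply a 4x4 rigid transform T to an (N, 3) array of points."""
    R = T[:3, :3]  # rotation part
    t = T[:3, 3]   # translation part
    return points @ R.T + t


# Hypothetical camera-to-world pose: rotate 90 degrees about Z,
# then shift 1 m along X.
T_wc = np.array([[0.0, -1.0, 0.0, 1.0],
                 [1.0, 0.0, 0.0, 0.0],
                 [0.0, 0.0, 1.0, 0.0],
                 [0.0, 0.0, 0.0, 1.0]])

points_cam = np.array([[1.0, 0.0, 2.0],
                       [0.0, 1.0, 3.0]])
points_world = transform_points(points_cam, T_wc)
print(points_world)
```

With the identity matrix the points are unchanged, which matches the np.eye(4) case shown above. If the dataset stores world-to-camera poses instead, the matrix has to be inverted first.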

